An Automatically Aligned Corpus of Child-Directed Speech

Authors

  • Micha Elsner
  • Kiwako Ito
Abstract

Forced alignment would enable phonetic analyses of child-directed speech (CDS) corpora that have existing transcriptions. But existing alignment systems are inaccurate due to the atypical phonetics of CDS. We adapt a Kaldi forced alignment system to CDS by extending the dictionary and providing it with heuristically derived hints for vowel locations. Using this system, we present a new time-aligned CDS corpus with a million aligned segments. We manually correct a subset of the corpus and demonstrate that our system is 70% accurate. Both our automatic and manually corrected alignments are publicly available at osf.io/ke44q.

Similar resources

Prosodic Features from Large Corpora of Child-Directed Speech as Predictors of the Age of Acquisition of Words

The impressive ability of children to acquire language is a widely studied phenomenon, and the factors influencing the pace and patterns of word learning remain a subject of active research. Although many models predicting the age of acquisition of words have been proposed, little emphasis has been placed on the raw input children receive. In this work we present a comparatively large-scale ...

A corpus of European Portuguese child and child-directed speech

We present a corpus of child and child-directed speech of European Portuguese. This corpus results from the expansion of an already existing database (Santos, 2006). It includes around 52 hours of child-adult interaction and now contains 27,595 child utterances and 70,736 adult utterances. The corpus was transcribed according to the CHILDES system (Child Language Data Exchange System) and using...

Characterizing Motherese: On the Computational Structure of Child-Directed Language

We report a quantitative analysis of the cross-utterance coordination observed in child-directed language, where successive utterances often overlap in a manner that makes their constituent structure more prominent, and describe the application of a recently published unsupervised algorithm for grammar induction to the largest available corpus of such language, producing a grammar capable of ac...

A Longitudinal Study of Prosodic Exaggeration in Child-Directed Speech

We investigate the role of prosody in child-directed speech of three English speaking adults using data collected for the Human Speechome Project, an ecologically valid, longitudinal corpus collected from the home of a family with a young child. We looked at differences in prosody between child-directed and adult-directed speech. We also looked at the change in prosody of child-directed speech ...

Publication date: 2017